Breaking AES-256 Bootloader

This tutorial will take you through a complete attack on an encrypted bootloader that uses AES-256. It demonstrates how to apply side-channel power analysis to a practical system, and discusses how to perform the analysis with different Analyzer leakage models.

Background

In the world of microcontrollers, a bootloader is a special piece of firmware that lets the user upload new programs into memory. This is especially useful for devices with complex code that may need to be patched or otherwise updated in the future - a bootloader makes it possible for the user to upload a patched version of the firmware onto the micro. The bootloader receives information from a communication line (a USB port, serial port, Ethernet port, Wi-Fi connection, etc.) and stores this data into program memory. Once the full firmware has been received, the micro can run its updated code.

There is one big security issue to worry about with bootloaders. A company may want to stop their customers from writing their own firmware and uploading it onto the micro. For example, this might be for protection reasons - hackers might be able to access parts of the device that weren't meant to be accessed. One way of stopping this is to add encryption. The company can add their own secret signature to the firmware code and encrypt it with a secret key. Then, the bootloader can decrypt the incoming firmware and confirm that the incoming firmware is correctly signed. Users will not know the secret key or the signature tied to the firmware, so they won't be able to "fake" their own.

This tutorial will work with a simple AES-256 bootloader. The victim will receive data through a serial connection, decrypt the command, and confirm that the included signature is correct. Then, it will only save the code into memory if the signature check succeeded. To make this system more robust against attacks, the bootloader will use cipher-block chaining (CBC mode). Our goal is to find the secret key and the CBC initialization vector so that we could successfully fake our own firmware.

Bootloader Communications Protocol

The bootloader's communications protocol operates over a serial port at 38400 baud. In this example the bootloader is always waiting for new data; in real life one would typically have to force the device into its bootloader through a command sequence.

Commands sent to the bootloader look as follows:

       |<-------- Encrypted block (16 bytes) ---------->|
       |                                                |
+------+------+------+------+------+------+ .... +------+------+------+
| 0x00 |    Signature (4 Bytes)    |  Data (12 Bytes)   |   CRC-16    |
+------+------+------+------+------+------+ .... +------+------+------+

This frame has four parts:

  • 0x00: 1 byte of fixed header
  • Signature: A secret 4 byte constant. The bootloader will confirm that this signature is correct after decrypting the frame.
  • Data: 12 bytes of the incoming firmware. This system forces us to send the code 12 bytes at a time; more complete bootloaders may allow longer variable-length frames.
  • CRC-16: A 16-bit checksum using the CRC-CCITT polynomial (0x1021). The capture code later in this tutorial sends the MSB of the CRC first, followed by the LSB. The bootloader will reply over the serial port, describing whether or not this CRC check was valid.

As described in the diagram, the 16 byte block is not sent as plaintext. Instead, it is encrypted using AES-256 in CBC mode. This encryption method will be described in the next section.

The bootloader responds to each command with a single byte indicating if the CRC-16 was OK or not:

            +------+
CRC-OK:     | 0xA4 |
            +------+

            +------+
CRC Failed: | 0xA1 |
            +------+

Then, after replying to the command, the bootloader verifies that the signature is correct. If it matches the expected manufacturer's signature, the 12 bytes of data will be written to flash memory. Otherwise, the data is discarded.

Details of AES-256 CBC

The system uses the AES algorithm in Cipher Block Chaining (CBC) mode. In general one avoids using encryption 'as-is' (i.e. Electronic Code Book mode), since it means any block of plaintext always maps to the same block of ciphertext. Cipher Block Chaining mixes each block with the previous ciphertext block, so that the same plaintext block appearing at different points in a message encrypts to different ciphertext.

You can see another reference on the design of the encryption side; we'll be only talking about the decryption side here. In this case AES-256 CBC mode is used as follows, where the details of the AES-256 Decryption block will be discussed in detail later:

[Figure: AES-256 decryption in CBC mode]

This diagram shows that the output of the decryption is no longer used directly as the plaintext. Instead, the output is XORed with a 16 byte mask, which is usually taken from the previous ciphertext. Also, the first decryption block has no previous ciphertext to use, so a secret initialization vector (IV) is used instead. If we are going to decrypt the entire ciphertext (including block 0) or correctly generate our own ciphertext, we'll need to find this IV along with the AES key.
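The chaining described above can be sketched in a few lines of Python. This is a minimal illustration, not the bootloader's code: `toy_encrypt` and `toy_decrypt` are hypothetical stand-ins for the AES-256 block cipher (a byte reversal plus a fixed offset), chosen only so the example runs without a crypto library.

```python
BLOCK = 16  # AES block size in bytes


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


# Hypothetical stand-ins for the AES-256 block cipher (NOT secure).
def toy_encrypt(block):
    return bytes((x + 7) % 256 for x in reversed(block))


def toy_decrypt(block):
    return bytes((x - 7) % 256 for x in reversed(block))


def cbc_encrypt(plaintext, iv, encrypt_block=toy_encrypt):
    """XOR each plaintext block with the previous ciphertext, then encrypt."""
    prev, out = iv, b""
    for off in range(0, len(plaintext), BLOCK):
        prev = encrypt_block(xor_bytes(plaintext[off:off + BLOCK], prev))
        out += prev
    return out


def cbc_decrypt(ciphertext, iv, decrypt_block=toy_decrypt):
    """Decrypt each block, then XOR with the previous ciphertext (or the IV)."""
    prev, out = iv, b""
    for off in range(0, len(ciphertext), BLOCK):
        block = ciphertext[off:off + BLOCK]
        out += xor_bytes(decrypt_block(block), prev)
        prev = block
    return out


iv = bytes(range(16))
plaintext = b"A" * 32            # two identical plaintext blocks
ciphertext = cbc_encrypt(plaintext, iv)
recovered = cbc_decrypt(ciphertext, iv)
wrong_iv = cbc_decrypt(ciphertext, bytes(16))
```

Decrypting with the wrong IV only corrupts the first block; every later block depends on the key and the previous ciphertext alone. This is why the key and the IV can be attacked as two separate secrets.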

Attacking AES-256

The system in this tutorial uses AES-256 encryption, which has a 256 bit (32 byte) key - twice as large as the 16 byte key we've attacked in previous tutorials. This means that our regular AES-128 CPA attacks won't quite work. However, extending these attacks to AES-256 is fairly straightforward: the theory is explained in detail in Extending AES-128 Attacks to AES-256.

As the theory page explains, our AES-256 attack will have 4 steps:

  1. Perform a standard attack (as in AES-128 decryption) to determine the first 16 bytes of the key, corresponding to the 14th round encryption key.
  2. Using the known 14th round key, calculate the hypothetical outputs of each S-Box from the 13th round using the ciphertext processed by the 14th round, and determine the 16 bytes of the 13th round key manipulated by inverse MixColumns.
  3. Perform the MixColumns and ShiftRows operation on the hypothetical key determined above, recovering the 13th round key.
  4. Using the AES-256 key schedule, reverse the 13th and 14th round keys to determine the original AES-256 encryption key.

Firmware

For this tutorial, we'll be using the bootloader-aes256 project, which we'll build as usual:

In [1]:
PLATFORM = "CWLITEARM"
CRYPTO_TARGET="NONE"
In [2]:
%%bash -s "$PLATFORM" "$CRYPTO_TARGET"
cd ../../hardware/victims/firmware/bootloader-aes256
make PLATFORM=$1 CRYPTO_TARGET=$2
rm -f -- bootloader-aes256-CWLITEARM.hex
rm -f -- bootloader-aes256-CWLITEARM.eep
rm -f -- bootloader-aes256-CWLITEARM.cof
rm -f -- bootloader-aes256-CWLITEARM.elf
rm -f -- bootloader-aes256-CWLITEARM.map
rm -f -- bootloader-aes256-CWLITEARM.sym
rm -f -- bootloader-aes256-CWLITEARM.lss
rm -f -- objdir/*.o
rm -f -- objdir/*.lst
rm -f -- bootloader.s aes256.s crcccitt.s simpleserial.s stm32f3_hal.s stm32f3_hal_lowlevel.s stm32f3_sysmem.s
rm -f -- bootloader.d aes256.d crcccitt.d simpleserial.d stm32f3_hal.d stm32f3_hal_lowlevel.d stm32f3_sysmem.d
rm -f -- bootloader.i aes256.i crcccitt.i simpleserial.i stm32f3_hal.i stm32f3_hal_lowlevel.i stm32f3_sysmem.i
mkdir objdir 
mkdir .dep
.
-------- begin --------
arm-none-eabi-gcc (15:6.3.1+svn253039-1build1) 6.3.1 20170620
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

.
Compiling C: bootloader.c
arm-none-eabi-gcc -c -mcpu=cortex-m4 -I. -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 -fmessage-length=0 -ffunction-sections -gdwarf-2 -DSS_VER=SS_VER_1_1 -DSTM32F303xC -DSTM32F3 -DSTM32 -DDEBUG -DHAL_TYPE=HAL_stm32f3 -DPLATFORM=CWLITEARM -DF_CPU=7372800UL -Os -funsigned-char -funsigned-bitfields -fshort-enums -Wall -Wstrict-prototypes -Wa,-adhlns=objdir/bootloader.lst -I.././simpleserial/ -I.././hal -I.././hal/stm32f3 -I.././hal/stm32f3/CMSIS -I.././hal/stm32f3/CMSIS/core -I.././hal/stm32f3/CMSIS/device -I.././hal/stm32f4/Legacy -I.././crypto/ -std=gnu99 -MMD -MP -MF .dep/bootloader.o.d bootloader.c -o objdir/bootloader.o 
.
Compiling C: aes256.c
arm-none-eabi-gcc -c -mcpu=cortex-m4 -I. -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 -fmessage-length=0 -ffunction-sections -gdwarf-2 -DSS_VER=SS_VER_1_1 -DSTM32F303xC -DSTM32F3 -DSTM32 -DDEBUG -DHAL_TYPE=HAL_stm32f3 -DPLATFORM=CWLITEARM -DF_CPU=7372800UL -Os -funsigned-char -funsigned-bitfields -fshort-enums -Wall -Wstrict-prototypes -Wa,-adhlns=objdir/aes256.lst -I.././simpleserial/ -I.././hal -I.././hal/stm32f3 -I.././hal/stm32f3/CMSIS -I.././hal/stm32f3/CMSIS/core -I.././hal/stm32f3/CMSIS/device -I.././hal/stm32f4/Legacy -I.././crypto/ -std=gnu99 -MMD -MP -MF .dep/aes256.o.d aes256.c -o objdir/aes256.o 
.
Compiling C: crcccitt.c
arm-none-eabi-gcc -c -mcpu=cortex-m4 -I. -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 -fmessage-length=0 -ffunction-sections -gdwarf-2 -DSS_VER=SS_VER_1_1 -DSTM32F303xC -DSTM32F3 -DSTM32 -DDEBUG -DHAL_TYPE=HAL_stm32f3 -DPLATFORM=CWLITEARM -DF_CPU=7372800UL -Os -funsigned-char -funsigned-bitfields -fshort-enums -Wall -Wstrict-prototypes -Wa,-adhlns=objdir/crcccitt.lst -I.././simpleserial/ -I.././hal -I.././hal/stm32f3 -I.././hal/stm32f3/CMSIS -I.././hal/stm32f3/CMSIS/core -I.././hal/stm32f3/CMSIS/device -I.././hal/stm32f4/Legacy -I.././crypto/ -std=gnu99 -MMD -MP -MF .dep/crcccitt.o.d crcccitt.c -o objdir/crcccitt.o 
.
Compiling C: .././simpleserial/simpleserial.c
arm-none-eabi-gcc -c -mcpu=cortex-m4 -I. -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 -fmessage-length=0 -ffunction-sections -gdwarf-2 -DSS_VER=SS_VER_1_1 -DSTM32F303xC -DSTM32F3 -DSTM32 -DDEBUG -DHAL_TYPE=HAL_stm32f3 -DPLATFORM=CWLITEARM -DF_CPU=7372800UL -Os -funsigned-char -funsigned-bitfields -fshort-enums -Wall -Wstrict-prototypes -Wa,-adhlns=objdir/simpleserial.lst -I.././simpleserial/ -I.././hal -I.././hal/stm32f3 -I.././hal/stm32f3/CMSIS -I.././hal/stm32f3/CMSIS/core -I.././hal/stm32f3/CMSIS/device -I.././hal/stm32f4/Legacy -I.././crypto/ -std=gnu99 -MMD -MP -MF .dep/simpleserial.o.d .././simpleserial/simpleserial.c -o objdir/simpleserial.o 
.
Compiling C: .././hal/stm32f3/stm32f3_hal.c
arm-none-eabi-gcc -c -mcpu=cortex-m4 -I. -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 -fmessage-length=0 -ffunction-sections -gdwarf-2 -DSS_VER=SS_VER_1_1 -DSTM32F303xC -DSTM32F3 -DSTM32 -DDEBUG -DHAL_TYPE=HAL_stm32f3 -DPLATFORM=CWLITEARM -DF_CPU=7372800UL -Os -funsigned-char -funsigned-bitfields -fshort-enums -Wall -Wstrict-prototypes -Wa,-adhlns=objdir/stm32f3_hal.lst -I.././simpleserial/ -I.././hal -I.././hal/stm32f3 -I.././hal/stm32f3/CMSIS -I.././hal/stm32f3/CMSIS/core -I.././hal/stm32f3/CMSIS/device -I.././hal/stm32f4/Legacy -I.././crypto/ -std=gnu99 -MMD -MP -MF .dep/stm32f3_hal.o.d .././hal/stm32f3/stm32f3_hal.c -o objdir/stm32f3_hal.o 
.
Compiling C: .././hal/stm32f3/stm32f3_hal_lowlevel.c
arm-none-eabi-gcc -c -mcpu=cortex-m4 -I. -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 -fmessage-length=0 -ffunction-sections -gdwarf-2 -DSS_VER=SS_VER_1_1 -DSTM32F303xC -DSTM32F3 -DSTM32 -DDEBUG -DHAL_TYPE=HAL_stm32f3 -DPLATFORM=CWLITEARM -DF_CPU=7372800UL -Os -funsigned-char -funsigned-bitfields -fshort-enums -Wall -Wstrict-prototypes -Wa,-adhlns=objdir/stm32f3_hal_lowlevel.lst -I.././simpleserial/ -I.././hal -I.././hal/stm32f3 -I.././hal/stm32f3/CMSIS -I.././hal/stm32f3/CMSIS/core -I.././hal/stm32f3/CMSIS/device -I.././hal/stm32f4/Legacy -I.././crypto/ -std=gnu99 -MMD -MP -MF .dep/stm32f3_hal_lowlevel.o.d .././hal/stm32f3/stm32f3_hal_lowlevel.c -o objdir/stm32f3_hal_lowlevel.o 
.
Compiling C: .././hal/stm32f3/stm32f3_sysmem.c
arm-none-eabi-gcc -c -mcpu=cortex-m4 -I. -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 -fmessage-length=0 -ffunction-sections -gdwarf-2 -DSS_VER=SS_VER_1_1 -DSTM32F303xC -DSTM32F3 -DSTM32 -DDEBUG -DHAL_TYPE=HAL_stm32f3 -DPLATFORM=CWLITEARM -DF_CPU=7372800UL -Os -funsigned-char -funsigned-bitfields -fshort-enums -Wall -Wstrict-prototypes -Wa,-adhlns=objdir/stm32f3_sysmem.lst -I.././simpleserial/ -I.././hal -I.././hal/stm32f3 -I.././hal/stm32f3/CMSIS -I.././hal/stm32f3/CMSIS/core -I.././hal/stm32f3/CMSIS/device -I.././hal/stm32f4/Legacy -I.././crypto/ -std=gnu99 -MMD -MP -MF .dep/stm32f3_sysmem.o.d .././hal/stm32f3/stm32f3_sysmem.c -o objdir/stm32f3_sysmem.o 
.
Assembling: .././hal/stm32f3/stm32f3_startup.S
arm-none-eabi-gcc -c -mcpu=cortex-m4 -I. -x assembler-with-cpp -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 -fmessage-length=0 -ffunction-sections -DF_CPU=7372800 -Wa,-gstabs,-adhlns=objdir/stm32f3_startup.lst -I.././simpleserial/ -I.././hal -I.././hal/stm32f3 -I.././hal/stm32f3/CMSIS -I.././hal/stm32f3/CMSIS/core -I.././hal/stm32f3/CMSIS/device -I.././hal/stm32f4/Legacy -I.././crypto/ .././hal/stm32f3/stm32f3_startup.S -o objdir/stm32f3_startup.o
.
Linking: bootloader-aes256-CWLITEARM.elf
arm-none-eabi-gcc -mcpu=cortex-m4 -I. -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 -fmessage-length=0 -ffunction-sections -gdwarf-2 -DSS_VER=SS_VER_1_1 -DSTM32F303xC -DSTM32F3 -DSTM32 -DDEBUG -DHAL_TYPE=HAL_stm32f3 -DPLATFORM=CWLITEARM -DF_CPU=7372800UL -Os -funsigned-char -funsigned-bitfields -fshort-enums -Wall -Wstrict-prototypes -Wa,-adhlns=objdir/bootloader.o -I.././simpleserial/ -I.././hal -I.././hal/stm32f3 -I.././hal/stm32f3/CMSIS -I.././hal/stm32f3/CMSIS/core -I.././hal/stm32f3/CMSIS/device -I.././hal/stm32f4/Legacy -I.././crypto/ -std=gnu99 -MMD -MP -MF .dep/bootloader-aes256-CWLITEARM.elf.d objdir/bootloader.o objdir/aes256.o objdir/crcccitt.o objdir/simpleserial.o objdir/stm32f3_hal.o objdir/stm32f3_hal_lowlevel.o objdir/stm32f3_sysmem.o objdir/stm32f3_startup.o --output bootloader-aes256-CWLITEARM.elf --specs=nano.specs -T .././hal/stm32f3/LinkerScript.ld -Wl,--gc-sections -lm -Wl,-Map=bootloader-aes256-CWLITEARM.map,--cref   -lm  
.
Creating load file for Flash: bootloader-aes256-CWLITEARM.hex
arm-none-eabi-objcopy -O ihex -R .eeprom -R .fuse -R .lock -R .signature bootloader-aes256-CWLITEARM.elf bootloader-aes256-CWLITEARM.hex
.
Creating load file for EEPROM: bootloader-aes256-CWLITEARM.eep
arm-none-eabi-objcopy -j .eeprom --set-section-flags=.eeprom="alloc,load" \
--change-section-lma .eeprom=0 --no-change-warnings -O ihex bootloader-aes256-CWLITEARM.elf bootloader-aes256-CWLITEARM.eep || exit 0
.
Creating Extended Listing: bootloader-aes256-CWLITEARM.lss
arm-none-eabi-objdump -h -S -z bootloader-aes256-CWLITEARM.elf > bootloader-aes256-CWLITEARM.lss
.
Creating Symbol Table: bootloader-aes256-CWLITEARM.sym
arm-none-eabi-nm -n bootloader-aes256-CWLITEARM.elf > bootloader-aes256-CWLITEARM.sym
Size after:
   text	   data	    bss	    dec	    hex	filename
   5912	      8	   2208	   8128	   1fc0	bootloader-aes256-CWLITEARM.elf
+--------------------------------------------------------
+ Built for platform CW-Lite Arm (STM32F3)
+--------------------------------------------------------

Capturing Traces

Setup

To start, we'll proceed with setup as usual:

In [3]:
%run "Helper_Scripts/CWLite_Connect.ipynb"
In [4]:
%run "Helper_Scripts/Setup_Target_Generic.ipynb"
In [5]:
# uncomment based on your target
fw_path = "../../hardware/victims/firmware/bootloader-aes256/bootloader-aes256-CWLITEARM.hex"
#%run "Helper_Scripts/Program_XMEGA.ipynb"
%run "Helper_Scripts/Program_STM.ipynb"
#%run "Helper_Scripts/No_Programmer.ipynb"
In [6]:
program_target(scope, fw_path)
Detected known STMF32: STM32F302xB(C)/303xB(C)
Extended erase (0x44), this can take ten seconds or more
Attempting to programming 5919 bytes at 0x8000000
STM32F Programming flash...
STM32F Reading flash...
Verified flash OK, 5919 bytes

Calculating the CRC

The next step in attacking this target is to communicate with it. Most of the transmission is fairly straightforward, but the CRC is a little tricky. Luckily, there's a lot of open source code out there for calculating CRCs. In this case, we'll pull some code from pycrc:

In [7]:
# Class Crc
#############################################################
# These CRC routines are copy-pasted from pycrc, which are:
# Copyright (c) 2006-2013 Thomas Pircher <tehpeh@gmx.net>
#
class Crc(object):
    """
    A base class for CRC routines.
    """

    def __init__(self, width, poly):
        """The Crc constructor.

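        The parameters are as follows:
            width   -- CRC width in bits
            poly    -- generator polynomial, without the implicit top bit

        In this trimmed-down copy there is no input/output reflection,
        and xor_in and xor_out are fixed to 0x0000.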
        """
        self.Width = width
        self.Poly = poly


        self.MSB_Mask = 0x1 << (self.Width - 1)
        self.Mask = ((self.MSB_Mask - 1) << 1) | 1

        self.XorIn = 0x0000
        self.XorOut = 0x0000

        self.DirectInit = self.XorIn
        self.NonDirectInit = self.__get_nondirect_init(self.XorIn)
        if self.Width < 8:
            self.CrcShift = 8 - self.Width
        else:
            self.CrcShift = 0

    def __get_nondirect_init(self, init):
        """
        return the non-direct init if the direct algorithm has been selected.
        """
        crc = init
        for i in range(self.Width):
            bit = crc & 0x01
            if bit:
                crc ^= self.Poly
            crc >>= 1
            if bit:
                crc |= self.MSB_Mask
        return crc & self.Mask


    def bit_by_bit(self, in_data):
        """
        Classic simple and slow CRC implementation.  This function iterates bit
        by bit over the augmented input message and returns the calculated CRC
        value at the end.
        """
        # If the input data is a string, convert to bytes.
        if isinstance(in_data, str):
            in_data = [ord(c) for c in in_data]

        register = self.NonDirectInit
        for octet in in_data:
            for i in range(8):
                topbit = register & self.MSB_Mask
                register = ((register << 1) & self.Mask) | ((octet >> (7 - i)) & 0x01)
                if topbit:
                    register ^= self.Poly

        for i in range(self.Width):
            topbit = register & self.MSB_Mask
            register = ((register << 1) & self.Mask)
            if topbit:
                register ^= self.Poly

        return register ^ self.XorOut
    
bl_crc = Crc(width = 16, poly=0x1021)

Now we can easily get the CRC for our message by calling bl_crc.bit_by_bit(message).
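As a cross-check, the same checksum can be computed with a much shorter routine. The sketch below assumes the parameters set in the class above (16 bit width, polynomial 0x1021, zero initial value, no reflection, no final XOR), a variant commonly called CRC-16/XMODEM. The name `crc16_ccitt` is made up for this example. Python's standard `binascii.crc_hqx` uses the same polynomial and should agree with it.

```python
import binascii


def crc16_ccitt(data, crc=0x0000):
    """Bitwise CRC-16 with polynomial 0x1021, MSB first, no reflection."""
    for byte in data:
        crc ^= byte << 8                      # bring the next byte into the top of the register
        for _ in range(8):
            if crc & 0x8000:                  # top bit set: shift and subtract the polynomial
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


check = crc16_ccitt(b"123456789")             # standard CRC test string
stdlib_check = binascii.crc_hqx(b"123456789", 0)
print("0x{:04X} 0x{:04X}".format(check, stdlib_check))
```

If both values match, either routine can stand in for bl_crc.bit_by_bit when building frames.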

Communicating with the Bootloader

With that done, we can start communicating with the bootloader. Recall that the bootloader expects:

  • To start with 0x00
  • A 16 byte encrypted message (4 bytes signature + 12 bytes data)
  • CRC16

We don't really care what the 16 byte message is (just that each one is different, so that we get a variety of Hamming weights), so we'll use the same key/text pattern module as in earlier attacks.
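The frame layout can be assembled in one small helper. This is a sketch under two assumptions taken from the capture code in this tutorial: the CRC covers only the 16 byte block (not the 0x00 header), and its high byte is sent first. `build_frame` is a hypothetical name, and `binascii.crc_hqx` is used here in place of the pycrc class.

```python
import binascii
import os


def build_frame(block):
    """Return header + 16-byte block + CRC-16 (high byte first)."""
    block = bytes(block)
    if len(block) != 16:
        raise ValueError("the encrypted block must be exactly 16 bytes")
    crc = binascii.crc_hqx(block, 0)          # polynomial 0x1021, initial value 0
    return bytes([0x00]) + block + bytes([crc >> 8, crc & 0xFF])


# Random data is fine for trace capture: the bootloader decrypts whatever it gets.
frame = build_frame(os.urandom(16))
print(frame.hex())
```

The resulting 19 bytes are what the capture loop passes to target.ser.write().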

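We can now run the following blocks. The first sends deliberately invalid frames until the bootloader answers, which flushes any partial frame left in its buffer; the second sends a well-formed frame, and we should get 0xA4 back. You may need to run the second block a few times to get the right response back.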

In [8]:
import time
okay = 0
while not okay:
    target.ser.write("\0xxxxxxxxxxxxxxxxxx")
    time.sleep(0.005)
    num_char = target.ser.inWaiting()
    response = target.ser.read(num_char)
    if response:
        if ord(response[0]) == 0xA1:
            okay = 1
In [9]:
from chipwhisperer.capture.acq_patterns.basic import AcqKeyTextPattern_Basic
import time
message = [0x00]
ktp = AcqKeyTextPattern_Basic(target=target)

# clear serial buffer
num_char = target.ser.inWaiting()
print(target.ser.read(num_char))

key, text = ktp.newPair() #don't care about key here
message.extend(text)

crc = bl_crc.bit_by_bit(text)

message.append(crc >> 8)
message.append(crc & 0xFF)

target.ser.write(message)
time.sleep(0.1)

num_char = target.ser.inWaiting()
response = target.ser.read(num_char)
print("Response: {:02X}".format(ord(response[0])))
Response: A4

Capturing Traces

With that out of the way, we can proceed to capturing our traces. The 5000 samples per trace we normally capture isn't long enough to cover the rounds we care about, so we'll need to increase it (11000 should be fine):

In [10]:
scope.adc.samples = 11000

We'll be working with Analyzer, so we'll need to use a ChipWhisperer project to store our traces and text:

In [11]:
project = cw.createProject("projects/Tutorial_A5.cwp", overwrite=True)
tc = project.getTraceFormat()
ktp = AcqKeyTextPattern_Basic(target=target)

Below you'll find our capture loop. This will be pretty similar to Tutorial B5, but we've added our communication code. We also check the response and just skip the data we get if it isn't correct.

In [12]:
#Capture Traces
from tqdm import tqdm
from chipwhisperer.capture.acq_patterns.basic import AcqKeyTextPattern_Basic
import numpy as np
import time
keys = []
N = 100  # Number of traces
target.init()
for i in tqdm(range(N), desc='Capturing traces'):
    message = [0x00]
    
    num_char = target.ser.inWaiting()
    target.ser.read(num_char)
    
    key, text = ktp.newPair()  # manual creation of a key, text pair can be substituted here
    keys.append(key)
    
    message.extend(text)
    
    crc = bl_crc.bit_by_bit(text)
    message.append(crc >> 8)
    message.append(crc & 0xFF)

    # run aux stuff that should run before the scope arms here

    scope.arm()

    # run aux stuff that should run after the scope arms here

    target.ser.write(message)
    timeout = 50
    # wait for target to finish
    while target.isDone() is False and timeout:
        timeout -= 1
        time.sleep(0.01)

    try:
        ret = scope.capture()
        if ret:
            print('Timeout happened during acquisition')
    except IOError as e:
        print('IOError: %s' % str(e))

    # run aux stuff that should happen after trace here
    num_char = target.ser.inWaiting()
    response = target.ser.read(num_char)
    if not response or ord(response[0]) != 0xA4:
        # Missing or bad response, just skip this trace
        print("Bad response: {!r}".format(response))
        continue
    
    tc.addTrace(scope.getLastTrace(), text, "", key)
    
tc._isloaded = True
project.traceManager().appendSegment(tc)
Capturing traces: 100%|██████████| 100/100 [00:17<00:00,  4.97it/s]

Analysis

Now that we have our traces, we can go ahead and perform the attack. As described in the background theory, we'll have to do two attacks - one to get the 14th round key, and another (using the first result) to get the 13th round key. Then, we'll do some post-processing to finally get the 256 bit encryption key.

14th Round Key

We can attack the 14th round key with a standard, no-frills CPA attack (using the inverse sbox, since it's a decryption that we're breaking):

In [13]:
import chipwhisperer as cw
from chipwhisperer.analyzer.attacks.cpa import CPA
from chipwhisperer.analyzer.attacks.cpa_algorithms.progressive import CPAProgressive
from chipwhisperer.analyzer.attacks.models.AES128_8bit import AES128_8bit, InvSBox_output

tm = project.traceManager()

attack = CPA()
leak_model = AES128_8bit(InvSBox_output)
attack.setAnalysisAlgorithm(CPAProgressive, leak_model)
attack.setTraceSource(tm)
attack.setTraceStart(0)
attack.setTracesPerAttack(tm.numTraces())
attack.setIterations(1)
attack.setReportingInterval(10)
attack.setTargetSubkeys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])

With the setup done, we can actually perform the attack. 11000 samples is a rather large amount to chew through, so if you want a faster attack you can use a smaller range in attack.setPointRange(). (2900, 4200) will work for XMEGA, while (1400, 2600) will work for the STM32F3 (CWLite ARM).

In [14]:
key = [0xea, 0x79, 0x79, 0x20, 0xc8, 0x71, 0x44, 0x7d, 0x46, 0x62, 0x5f, 0x51, 0x85, 0xc1, 0x3b, 0xcb]

import pandas as pd
def format_stat(stat):
    return str("{:02X}<br>{:.3f}".format(stat[0], stat[2]))

def color_corr_key(row):
    global key
    ret = [""] * 16
    for i,bnum in enumerate(row):
        if bnum[0] == key[i]:
            ret[i] = "color: red"
        else:
            ret[i] = ""
    return ret

from IPython.display import clear_output
import numpy as np
        
def stats_callback():
    attack_results = attack.getStatistics()
    attack_results.setKnownkey(key)
    stat_data = attack_results.findMaximums()
    df = pd.DataFrame(stat_data).transpose()
    clear_output(wait=True)
    display(df.head().style.format(format_stat).apply(color_corr_key,axis=1))
    
attack.setPointRange((1400, 2600))
attack_results = attack.processTracesNoGUI(stats_callback)
        0      1      2      3      4      5      6      7      8      9     10     11     12     13     14     15
0      EA     79     79     20     C8     71     44     7D     46     62     5F     51     85     C1     3B     CB
    0.873  0.885  0.854  0.909  0.877  0.887  0.859  0.893  0.897  0.918  0.894  0.883  0.885  0.884  0.902  0.878
1      EE     5D     41     AE     FD     7A     2F     A2     98     B5     1C     9B     3B     BC     DF     5D
    0.446  0.454  0.443  0.438  0.476  0.442  0.486  0.480  0.465  0.447  0.472  0.429  0.436  0.436  0.448  0.454
2      9A     7C     5C     EC     B0     9A     EF     94     24     9D     70     5B     23     0A     20     50
    0.426  0.449  0.430  0.420  0.472  0.435  0.415  0.459  0.437  0.428  0.437  0.426  0.419  0.421  0.424  0.449
3      36     67     27     F5     93     17     9E     02     1C     8B     C6     B9     2C     4D     62     6E
    0.419  0.448  0.420  0.419  0.447  0.435  0.411  0.447  0.430  0.428  0.420  0.417  0.419  0.419  0.414  0.441
4      8D     F3     F4     D1     B6     25     D4     04     AE     92     42     1F     7C     1B     AF     75
    0.419  0.439  0.413  0.410  0.446  0.432  0.409  0.431  0.417  0.414  0.419  0.416  0.418  0.412  0.412  0.430
In [15]:
rec_key = []
for bnum in attack_results.findMaximums():
    rec_key.append(bnum[0][0])
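To build some intuition for what the CPA attack above computes, here is a small, hardware-free simulation for a single key byte. It is only a sketch: the "power traces" are synthetic single samples (Hamming weight of the inverse S-box output plus Gaussian noise), the key byte 0xEA is borrowed from the result above purely as an example value, and all function names are made up for this illustration. Analyzer does the same kind of correlation for every byte and every sample point.

```python
import random


def gmul(a, b):
    """Multiply two bytes in GF(2^8) using the AES polynomial 0x11B."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def build_sbox():
    """Build the AES S-box: multiplicative inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gmul(x, y) == 1), 0)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()
INV_SBOX = [0] * 256
for i, s in enumerate(SBOX):
    INV_SBOX[s] = i


def hamming_weight(x):
    return bin(x).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0


random.seed(0)
TRUE_KEY_BYTE = 0xEA                       # example value only
inputs = [random.randrange(256) for _ in range(300)]
# Synthetic leakage: one sample per trace, HW of the inverse S-box output plus noise.
traces = [hamming_weight(INV_SBOX[c ^ TRUE_KEY_BYTE]) + random.gauss(0, 0.5)
          for c in inputs]

# Correlate the measured leakage with the model for every key guess.
scores = []
for guess in range(256):
    model = [hamming_weight(INV_SBOX[c ^ guess]) for c in inputs]
    scores.append(abs(pearson(model, traces)))

best_guess = max(range(256), key=lambda g: scores[g])
print("Best guess = 0x{:02X}".format(best_guess))
```

The correct guess stands out for the same reason the top row of the table above stands out: only the right key byte makes the modelled Hamming weights track the measured leakage.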

13th Round Key

Analyzer doesn't have a leakage model for the 13th round key built in, so we'll need to create our own. An example class is shown below along with the beginning of the setup. NOTE: You'll need to update calc_round_key with the key you found in the last step.

In [16]:
import chipwhisperer as cw
from chipwhisperer.analyzer.attacks.cpa import CPA
from chipwhisperer.analyzer.attacks.cpa_algorithms.progressive import CPAProgressive
from chipwhisperer.analyzer.attacks.models.AES128_8bit import AES128_8bit, AESLeakageHelper
from chipwhisperer.analyzer.preprocessing.resync_sad import ResyncSAD

class AES256_Round13_Model(AESLeakageHelper):
    def leakage(self, pt, ct, guess, bnum):
        #You must put YOUR recovered 14th round key here - this example may not be accurate!
        calc_round_key = [0xea, 0x79, 0x79, 0x20, 0xc8, 0x71, 0x44, 0x7d, 0x46, 0x62, 0x5f, 0x51, 0x85, 0xc1, 0x3b, 0xcb]
        xored = [calc_round_key[i] ^ pt[i] for i in range(0, 16)]
        block = xored
        block = self.inv_shiftrows(block)
        block = self.inv_subbytes(block)
        block = self.inv_mixcolumns(block)
        block = self.inv_shiftrows(block)
        result = block
        return self.inv_sbox((result[bnum] ^ guess[bnum]))
    
attack = CPA()
leak_model = AES128_8bit(AES256_Round13_Model)
attack.setAnalysisAlgorithm(CPAProgressive, leak_model)
attack.setTraceSource(tm)

Resyncing Traces (XMEGA Only)

The traces for the XMEGA version of the firmware become desynced around sample 7000. This is due to a non-constant-time AES implementation: the code does not take the same amount of time to run for every input. (It's actually possible to do a timing attack on this AES implementation! We'll stick with our CPA attack for now.)

While this does open up a timing attack, it actually makes our AES attack a little harder, since we'll have to resync the traces. Luckily, this can be done pretty easily by using the ResyncSAD preprocessing module:

In [17]:
# XMEGA only: uncomment these lines to resync the traces before the attack
# resync_traces = ResyncSAD(tm)
# resync_traces.enabled = True
# resync_traces.ref_trace = 0
# resync_traces.target_window = (9100, 9300)
# resync_traces.max_shift = 200
# attack.setTraceSource(resync_traces)
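The idea behind sum-of-absolute-differences (SAD) resynchronisation can be sketched without any hardware. The code below is an illustration of the general technique, not the ResyncSAD implementation: `find_shift` and `realign` are hypothetical names, and the "traces" are short synthetic lists. It slides a window of the trace against the same window of a reference trace and keeps the shift with the smallest SAD.

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def find_shift(ref, trace, window, max_shift):
    """Return the shift (in samples) that best matches trace to ref inside window."""
    start, end = window
    best_shift, best_sad = 0, float("inf")
    for shift in range(-max_shift, max_shift + 1):
        lo, hi = start + shift, end + shift
        if lo < 0 or hi > len(trace):
            continue                          # window would fall off the trace
        score = sad(ref[start:end], trace[lo:hi])
        if score < best_sad:
            best_shift, best_sad = shift, score
    return best_shift


def realign(trace, shift):
    """Move the trace left by shift samples, padding the gap with zeros."""
    if shift >= 0:
        return trace[shift:] + [0] * shift
    return [0] * (-shift) + trace[:shift]


# Synthetic example: the second trace is the reference delayed by 3 samples.
ref = [(i * i) % 17 for i in range(100)]
delayed = [0, 0, 0] + ref[:-3]

shift = find_shift(ref, delayed, window=(40, 60), max_shift=5)
aligned = realign(delayed, shift)
print("detected shift:", shift)
```

The target_window and max_shift settings above play the same roles as the window and max_shift arguments in this sketch.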

Running the Attack

Like in the 14th round attack, we can use a smaller range of points to make the attack faster. (8000,10990) works well for the XMEGA, while (6500, 8500) works well for the STM32F3.

In [18]:
attack.setTraceStart(0)
attack.setTracesPerAttack(tm.numTraces())
attack.setIterations(1)
attack.setReportingInterval(10)
attack.setTargetSubkeys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
attack.setPointRange((6500,8500))
attack_results = attack.processTracesNoGUI(stats_callback)
        0      1      2      3      4      5      6      7      8      9     10     11     12     13     14     15
0      C6     BD     4E     50     AB     CA     75     77     79     87     96     CA     1C     7F     C5     82
    0.893  0.892  0.873  0.895  0.900  0.897  0.853  0.914  0.927  0.880  0.894  0.897  0.885  0.891  0.879  0.893
1      D6     DB     78     F0     CA     BE     BE     23     2C     23     CE     3B     E3     89     F3     45
    0.452  0.471  0.452  0.462  0.473  0.452  0.452  0.461  0.448  0.443  0.472  0.511  0.479  0.508  0.425  0.439
2      7A     F0     9F     0C     9F     96     37     21     33     A0     C8     A7     1D     41     4E     EA
    0.437  0.432  0.448  0.436  0.438  0.452  0.442  0.458  0.429  0.434  0.462  0.462  0.458  0.474  0.424  0.427
3      DD     6A     B2     76     64     13     46     84     43     5C     0E     83     15     D3     3E     49
    0.436  0.427  0.438  0.432  0.415  0.426  0.441  0.439  0.422  0.432  0.429  0.459  0.456  0.431  0.422  0.426
4      22     B7     4D     E6     A5     A7     4C     54     53     71     6C     45     9C     0C     BD     6D
    0.401  0.421  0.432  0.413  0.412  0.421  0.435  0.425  0.412  0.431  0.427  0.456  0.428  0.429  0.416  0.413

You can run the block below and the correct key should be printed out:

In [19]:
rec_key2 = []
for bnum in attack_results.findMaximums():
    print("Best Guess = 0x{:02X}, Corr = {}".format(bnum[0][0], bnum[0][2]))
    rec_key2.append(bnum[0][0])
Best Guess = 0xC6, Corr = 0.8933992360456808
Best Guess = 0xBD, Corr = 0.8916443217443042
Best Guess = 0x4E, Corr = 0.873121304492336
Best Guess = 0x50, Corr = 0.894970035139991
Best Guess = 0xAB, Corr = 0.8999579800543815
Best Guess = 0xCA, Corr = 0.897014915878979
Best Guess = 0x75, Corr = 0.8526863998669161
Best Guess = 0x77, Corr = 0.913856657524775
Best Guess = 0x79, Corr = 0.9268860543192148
Best Guess = 0x87, Corr = 0.8795612343711698
Best Guess = 0x96, Corr = 0.8939060644919773
Best Guess = 0xCA, Corr = 0.8968155520080009
Best Guess = 0x1C, Corr = 0.884507210211917
Best Guess = 0x7F, Corr = 0.8911909328378526
Best Guess = 0xC5, Corr = 0.8792472910465613
Best Guess = 0x82, Corr = 0.8929822750520692

This, however, isn't actually the 13th round key. To get the real 13th round key, we'll need to run what we've recovered through a shiftrows() and mixcolumns() operation:

In [20]:
from chipwhisperer.analyzer.attacks.models.aes.funcs import shiftrows,mixcolumns
    
real_key2 = shiftrows(rec_key2)
real_key2 = mixcolumns(real_key2)

print("Recovered:", end="")
for subkey in real_key2:
    print(" {:02X}".format(subkey), end="")
print("")
Recovered: C6 6A A6 12 4A BA 4D 04 4A 22 03 54 5B 28 0E 63
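For readers who want to see what those two helper calls do, here is a plain-Python sketch of ShiftRows and MixColumns on a 16 byte, column-major state (byte index = row + 4 * column). The function names `shift_rows` and `mix_columns` are made up for this example and are not the ChipWhisperer functions. The input is the list of best guesses printed above.

```python
def xtime(a):
    """Multiply a byte by 2 in GF(2^8) using the AES polynomial 0x11B."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a


def shift_rows(state):
    """Rotate row r of the column-major state left by r positions."""
    return [state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


def mix_columns(state):
    """Multiply each column by the AES MixColumns matrix (2 3 1 1, rotated per row)."""
    out = []
    for c in range(4):
        a = state[4 * c:4 * c + 4]
        for r in range(4):
            two = xtime(a[r])                               # 2 * a[r]
            three = xtime(a[(r + 1) % 4]) ^ a[(r + 1) % 4]  # 3 * a[r+1]
            out.append(two ^ three ^ a[(r + 2) % 4] ^ a[(r + 3) % 4])
    return out


# Best guesses from the 13th round attack above.
rec_key2 = [0xC6, 0xBD, 0x4E, 0x50, 0xAB, 0xCA, 0x75, 0x77,
            0x79, 0x87, 0x96, 0xCA, 0x1C, 0x7F, 0xC5, 0x82]

real_key2 = mix_columns(shift_rows(rec_key2))
print("Recovered:", " ".join("{:02X}".format(b) for b in real_key2))
```

Both operations are linear, which is why they can be applied to the recovered value after the attack instead of being folded into every key guess.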

We now have everything we need to recover the full key! We'll start by combining the 13th and 14th round keys:

In [21]:
rec_key_comb = real_key2.copy()
rec_key_comb.extend(rec_key)

print("Key:", end="")
for subkey in rec_key_comb:
    print(" {:02X}".format(subkey), end="")
print("")
Key: C6 6A A6 12 4A BA 4D 04 4A 22 03 54 5B 28 0E 63 EA 79 79 20 C8 71 44 7D 46 62 5F 51 85 C1 3B CB

We can then use the key schedule helper of the AES128_8bit leakage model to roll these back to the first two round keys, which together make up the original 256 bit key:

In [22]:
btldr_key = leak_model.keyScheduleRounds(rec_key_comb, 13, 0)
btldr_key.extend(leak_model.keyScheduleRounds(rec_key_comb, 13, 1))
print("Key:", end="")
for subkey in btldr_key:
    print(" {:02X}".format(subkey), end="")
print("")
Key: 94 28 5D 4D 6D CF EC 08 D8 AC DD F6 BE 25 A4 99 C4 D9 D0 1E C3 40 7E D7 D5 28 D4 09 E9 F0 88 A1

You should see a 32 byte key printed out. Open supersecret.h, confirm that we have the right key, and celebrate!
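The key schedule reversal can also be written out directly. The sketch below follows the standard AES-256 key expansion run backwards, starting from the 13th and 14th round keys recovered above; it is an illustration with made-up function names, not the Analyzer implementation. If the round keys are correct it should reproduce the key printed above.

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) using the AES polynomial 0x11B."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def build_sbox():
    """Build the AES S-box: multiplicative inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gmul(x, y) == 1), 0)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()
RCON = [None, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40]


def reverse_aes256_key_schedule(round13, round14):
    """Recover the 32-byte cipher key from round keys 13 and 14."""
    w = [None] * 60                           # 15 round keys of 4 words each
    known = list(round13) + list(round14)
    for j in range(8):
        w[52 + j] = known[4 * j:4 * j + 4]
    # Forward rule: w[i] = w[i-8] ^ f(w[i-1]); solve it for w[i-8].
    for i in range(59, 7, -1):
        temp = list(w[i - 1])
        if i % 8 == 0:
            temp = temp[1:] + temp[:1]        # RotWord
            temp = [SBOX[b] for b in temp]    # SubWord
            temp[0] ^= RCON[i // 8]
        elif i % 8 == 4:
            temp = [SBOX[b] for b in temp]    # SubWord only
        w[i - 8] = [a ^ b for a, b in zip(w[i], temp)]
    return [b for word in w[:8] for b in word]


round13 = [0xC6, 0x6A, 0xA6, 0x12, 0x4A, 0xBA, 0x4D, 0x04,
           0x4A, 0x22, 0x03, 0x54, 0x5B, 0x28, 0x0E, 0x63]
round14 = [0xEA, 0x79, 0x79, 0x20, 0xC8, 0x71, 0x44, 0x7D,
           0x46, 0x62, 0x5F, 0x51, 0x85, 0xC1, 0x3B, 0xCB]

key = reverse_aes256_key_schedule(round13, round14)
print("Key:", " ".join("{:02X}".format(b) for b in key))
```

Each step only needs words that are already known, so two consecutive round keys from anywhere in the schedule are enough to recover the whole key.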

Recovering the IV

Now that we have the encryption key, we can move on to attacking the next secret value: the IV.

Here, we have the luxury of seeing the source code of the bootloader. This is generally not something we would have access to in the real world, so we'll try not to use it to cheat. (Peeking at supersecret.h counts as cheating.) Instead, we'll use the source to help us identify important parts of the power traces.

Bootloader Source Code

Inside the bootloader's main loop, it does three tasks that we're interested in:

  • it decrypts the incoming ciphertext;
  • it applies the IV to the decryption's result; and
  • it checks for the signature in the resulting plaintext.

This snippet from bootloader.c shows all three of the tasks:

// Continue with decryption
trigger_high();                
aes256_decrypt_ecb(&ctx, tmp32);
trigger_low();

// Apply IV (first 16 bytes)
for (i = 0; i < 16; i++){
    tmp32[i] ^= iv[i];
}

//Save IV for next time from original ciphertext                
for (i = 0; i < 16; i++){
    iv[i] = tmp32[i+16];
}

// Tell the user that the CRC check was okay
putch(COMM_OK);
putch(COMM_OK);

//Check the signature
if ((tmp32[0] == SIGNATURE1) &&
   (tmp32[1] == SIGNATURE2) &&
   (tmp32[2] == SIGNATURE3) &&
   (tmp32[3] == SIGNATURE4)){

   // Delay to emulate a write to flash memory
   _delay_ms(1);
}

This gives us a pretty good idea of how the microcontroller is going to do its job, but if you'd like to go further, you can open the .lss file for the binary that was built. This is called a listing file and it lets you see the assembly that the C was compiled and linked to.
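
The data flow of the C snippet can also be modelled in a few lines of Python. This is only a sketch: toy_decrypt is a made-up stand-in for aes256_decrypt_ecb (it is not AES), and BootloaderModel, KEY and IV are hypothetical names and values. What it does show is that the IV changes after every block, so the original IV only takes part in the first block after a reset.

```python
def toy_decrypt(block, key):
    """Stand-in for aes256_decrypt_ecb. NOT real AES, just an invertible byte mix."""
    return bytes(((b - 1) & 0xFF) ^ k for b, k in zip(block, key))

class BootloaderModel:
    """Mimics the decrypt / apply-IV / save-IV sequence of the main loop."""

    def __init__(self, key, iv):
        self.key = bytes(key)
        self.iv = bytes(iv)           # state right after a reset

    def process(self, ciphertext):
        dr = toy_decrypt(ciphertext, self.key)           # decryption result
        pt = bytes(d ^ v for d, v in zip(dr, self.iv))   # apply IV
        self.iv = bytes(ciphertext)                      # save IV for next time
        return pt

KEY = bytes(range(16, 32))    # hypothetical toy key
IV = bytes([0xAA] * 16)       # hypothetical secret IV
ct = bytes(range(16))

device = BootloaderModel(KEY, IV)
first = device.process(ct)
second = device.process(ct)                  # same ciphertext, different IV state
after_reset = BootloaderModel(KEY, IV).process(ct)

print(first.hex())
print(second.hex())
print(after_reset == first)
```

The second call no longer involves the secret IV at all, which is why each trace used for the IV attack has to be taken right after a reset.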

Power Traces

As you can see from both files, after the decryption process, the bootloader executes a few distinct pieces of code:

  • To apply the IV, it uses an XOR operation;
  • To store the new IV, it copies the previous ciphertext into the IV array;
  • It sends two bytes on the serial port;
  • It checks the bytes of the signature one by one.

We should be able to recognize these four parts of the code in the power traces. Let's modify our capture routine to find them:

  1. We're looking for the original IV, but it's overwritten after each successful decryption. This means we'll have to reset the target before each trace we capture.
  2. We'd like to skip over all of the decryption process. Recall that the trigger pin is set low after the decryption finishes. This means we can skip over the AES-256 function by triggering on a falling edge instead.
  3. Depending on the target, we may have to flush the target's serial lines by sending it a bunch of invalid data and looking for a bad CRC return. This slows down the capture process by a lot, so you may want to try without doing this first.
  4. We won't need as many samples, so we can reduce how many we capture. 3000 should be sufficient for most targets.

Let's start by reducing our samples and making a function to reset our target (depending on your target, you may need to change the reset pin):

In [23]:
import time
scope.adc.samples = 3000
def reset_target(scope):
    scope.io.nrst = 'low'
    #scope.io.pdic = 'low'
    time.sleep(0.05)
    scope.io.nrst = 'high'
    #scope.io.pdic = 'high'

We can trigger on a falling edge by changing scope.adc.basic_mode to "falling_edge":

In [24]:
scope.adc.basic_mode = "falling_edge"

We can flush the serial line by sending an invalid message, then checking for a bad CRC return value (0xA1). Let's make sure our changes work by getting a trace:

In [25]:
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
from chipwhisperer.capture.acq_patterns.basic import AcqKeyTextPattern_Basic
reset_target(scope)
message = [0x00]


num_char = target.ser.inWaiting()
target.ser.read(num_char)

key, text = ktp.newPair()  # manual creation of a key, text pair can be substituted here

message.extend(text)

crc = bl_crc.bit_by_bit(text)
message.append(crc >> 8)
message.append(crc & 0xFF)

#flush target's serial
okay = 0
while not okay:
    target.ser.write("\0xxxxxxxxxxxxxxxxxx")
    time.sleep(0.005)
    num_char = target.ser.inWaiting()
    response = target.ser.read(num_char)
    if response:
        if ord(response[0]) == 0xA1:
            okay = 1

scope.arm()

target.ser.write(message)
timeout = 50
# wait for target to finish
while target.isDone() is False and timeout:
    timeout -= 1
    time.sleep(0.01)

try:
    ret = scope.capture()
    if ret:
        print('Timeout happened during acquisition')
except IOError as e:
    print('IOError: %s' % str(e))

# run aux stuff that should happen after trace here
num_char = target.ser.inWaiting()
response = target.ser.read(num_char)
if ord(response[0]) != 0xA4:
    # Bad response, just skip
    print("Bad response: {:02X}".format(ord(response[0])))


trace = scope.getLastTrace()


output_notebook()
p = figure()

xrange = range(len(trace))
p.line(xrange, trace, line_color="red")
show(p)

You should see 5 different sections:

  • 16 XORs
  • 16 register loads (this is the new IV being copied over)
  • Some serial communication
  • The signature check
  • The serial line going idle

Different targets have different power traces (for example, on Arm the XORs and register loads are almost identical), but hopefully you can pick out where each section is. For example, on XMEGA:

[Figure: XMEGA_Bonus_Trace]

With all of these things clearly visible, we have a pretty good idea of how to attack the IV and the signature. We should be able to look at each of the XOR spikes to find each of the IV bytes - each byte is processed on its own. Then, the signature check uses a short-circuiting comparison: as soon as it finds a byte in error, it stops checking the remaining bytes. This type of check is susceptible to a timing attack.

With those things done, we can move onto our capture loop. It's pretty similar to our last one. We're done with Analyzer, so we can store our traces in Python lists (we'll convert to numpy arrays later for easy analysis).

In [26]:
from tqdm import tqdm
from chipwhisperer.capture.acq_patterns.basic import AcqKeyTextPattern_Basic
import numpy as np
import time
traces = []
keys = []
plaintexts = []
N = 250  # Number of traces
target.init()
for i in tqdm(range(N), desc='Capturing traces'):
    reset_target(scope)
    message = [0x00]
    

    num_char = target.ser.inWaiting()
    target.ser.read(num_char)
    
    key, text = ktp.newPair()  # manual creation of a key, text pair can be substituted here
    keys.append(key)
    plaintexts.append(text)
    
    message.extend(text)
    
    crc = bl_crc.bit_by_bit(text)
    message.append(crc >> 8)
    message.append(crc & 0xFF)

    # run aux stuff that should run before the scope arms here
    
    #flush target's serial
    okay = 0
    while not okay:
        target.ser.write("\0xxxxxxxxxxxxxxxxxx")
        time.sleep(0.005)
        num_char = target.ser.inWaiting()
        response = target.ser.read(num_char)
        if response:
            if ord(response[0]) == 0xA1:
                okay = 1
    scope.arm()

    # run aux stuff that should run after the scope arms here

    target.ser.write(message)
    timeout = 50
    # wait for target to finish
    while target.isDone() is False and timeout:
        timeout -= 1
        time.sleep(0.01)

    try:
        ret = scope.capture()
        if ret:
            print('Timeout happened during acquisition')
            # drop the pair recorded above so plaintexts stay aligned with traces
            keys.pop()
            plaintexts.pop()
            continue
    except IOError as e:
        print('IOError: %s' % str(e))
        keys.pop()
        plaintexts.pop()
        continue

    # run aux stuff that should happen after trace here
    num_char = target.ser.inWaiting()
    response = target.ser.read(num_char)
    if not response or ord(response[0]) != 0xA4:
        # Bad or missing response, just skip
        print("Bad response: {!r}".format(response))
        keys.pop()
        plaintexts.pop()
        continue
    
    traces.append(scope.getLastTrace())
Capturing traces: 100%|██████████| 250/250 [02:42<00:00,  1.23it/s]

Analysis

Attack Theory

The bootloader applies the IV to the AES decryption result by calculating

$\text{PT} = \text{DR} \oplus \text{IV}$

where DR is the decrypted ciphertext, IV is the secret vector, and PT is the plaintext that the bootloader will use later. We only have access to one of these: since we know the AES-256 key and the ciphertext we sent, we can calculate DR. This exclusive or should be visible in the power traces.

This is enough information for us to attack a single bit of the IV. Suppose we only wanted to get the first bit (number 0) of the IV. We could do the following:

  • Split all of the traces into two groups: those with DR[0] = 0, and those with DR[0] = 1.
  • Calculate the average trace for both groups.
  • Find the difference between the two averages. It should include a noticeable spike during the first iteration of the loop.
  • Look at the direction of the spike to decide if the IV bit is 0 (PT[0] = DR[0]) or if the IV bit is 1 (PT[0] = ~DR[0]).

This is effectively a DPA attack on a single bit of the IV. We can repeat this attack 128 times to recover the entire IV.
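
To see the difference-of-means idea work without any hardware, here is a small simulation for one IV byte. It is a sketch under loud assumptions: the "power sample" is simply the Hamming weight of PT plus Gaussian noise, SECRET_IV_BYTE is a made-up value, and the sign convention (positive difference means IV bit 0) belongs to this model only. On a real capture the polarity depends on the target and the measurement setup.

```python
import random

def hamming_weight(x):
    return bin(x).count("1")

def simulate_leakage(secret_iv_byte, n_traces, rng):
    """Return (dr_bytes, samples): one simulated power sample per trace."""
    drs, samples = [], []
    for _ in range(n_traces):
        dr = rng.randrange(256)                  # known decryption result
        pt = dr ^ secret_iv_byte                 # what the XOR produces on the device
        samples.append(hamming_weight(pt) + rng.gauss(0, 0.5))
        drs.append(dr)
    return drs, samples

def recover_iv_byte(drs, samples):
    iv = 0
    for bit in range(8):
        groups = ([], [])
        for dr, sample in zip(drs, samples):
            groups[(dr >> bit) & 1].append(sample)
        mean0 = sum(groups[0]) / len(groups[0])
        mean1 = sum(groups[1]) / len(groups[1])
        # In this model, a negative difference means PT bit = ~DR bit,
        # which happens exactly when the IV bit is 1.
        if mean1 - mean0 < 0:
            iv |= 1 << bit
    return iv

SECRET_IV_BYTE = 0x5A                            # hypothetical value to recover
rng = random.Random(1234)
drs, samples = simulate_leakage(SECRET_IV_BYTE, 2000, rng)
print("{:02X}".format(recover_iv_byte(drs, samples)))
```

Each bit is decided from the sign of a single number, which is why locating the right sample point in the real traces matters more than the statistics that follow.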

A 1-Bit Attack

Recall that we're looking for the XOR applied to the decrypted block, so we'll need to compute the decryption result ourselves. PyCrypto includes an AES decryption routine, so we'll be using that. We'll start by importing the necessary modules and converting our traces and plaintexts to NumPy arrays:

In [27]:
from Crypto.Cipher import AES
import numpy as np

trace_array = np.asarray(traces)  # if you prefer to work with numpy array for number crunching
textin_array = np.asarray(plaintexts)

numTraces = len(trace_array)
traceLen = len(trace_array[0])

Next we'll do the AES256 decryption. If you got a different key in the earlier part, you'll need to change knownkey.

In [28]:
knownkey = [0x94, 0x28, 0x5D, 0x4D, 0x6D, 0xCF, 0xEC, 0x08, 0xD8, 0xAC, 0xDD, 0xF6, 0xBE, 0x25, 0xA4, 0x99,
            0xC4, 0xD9, 0xD0, 0x1E, 0xC3, 0x40, 0x7E, 0xD7, 0xD5, 0x28, 0xD4, 0x09, 0xE9, 0xF0, 0x88, 0xA1]

knownkey = bytes(knownkey)
dr = []
aes = AES.new(knownkey, AES.MODE_ECB)
for i in range(numTraces):
    ct = bytes(textin_array[i])
    pt = aes.decrypt(ct)
    d = list(pt[:16])  # decrypted bytes as a list of ints
    dr.append(d)

Now, let's split the traces into two groups by comparing bit 0 of the DR:

In [29]:
groupedTraces = [[] for _ in range(2)]
for i in range(numTraces):
    bit0 = dr[i][0] & 0x01
    groupedTraces[bit0].append(trace_array[i])
print(len(groupedTraces[0]))
118

With the 250 traces captured above, you should expect this to print a number around 125, since roughly half of the traces should fall into each group (our run printed 118). Now, NumPy's average function lets us easily calculate the average at each point:

In [30]:
# Find averages and differences
means = []
for i in range(2):
    means.append(np.average(groupedTraces[i], axis=0))
diff = means[1] - means[0]

Finally, we can plot this difference to see if we can spot the IV:

In [31]:
# Plot the difference between the two group averages
from bokeh.plotting import figure, show
from bokeh.io import output_notebook

output_notebook()
p = figure()

xrange = range(len(diff))
xrange2 = range(len(traces[0]))
p.line(xrange, diff, line_color="red")
#p.line(xrange2, traces[0], line_color='blue')
show(p)

You should see a few visible spikes. We're looking for the XOR for byte 0 here, so any later spikes won't be the XOR. Use bokeh's zoom functionality to pinpoint all the largest spikes and record their sample location. You'll probably need to record a few: only one is the correct spike, but we won't be able to tell until we repeat this with other bytes. For example, you might have spikes at 37, 41, and 45. Make sure you record all these values. These peaks won't all be above 0, so make sure you're looking at both positive and negative values.

Next, we'll need to repeat this with a few more bytes. To make things easier, the necessary code has been combined into the below block. Increment the 0 in bit0 = dr[i][0] & 0x01 to other numbers to attack other bytes. Attacking bytes 0 through 3 should be sufficient.

In [32]:
groupedTraces = [[] for _ in range(2)]
for i in range(numTraces):
    bit0 = dr[i][0] & 0x01
    groupedTraces[bit0].append(trace_array[i])
print(len(groupedTraces[0]))

# Find averages and differences
means = []
for i in range(2):
    means.append(np.average(groupedTraces[i], axis=0))
diff = means[1] - means[0]

# Plot the difference between the two group averages
from bokeh.plotting import figure, show
from bokeh.io import output_notebook

output_notebook()
p = figure()

xrange = range(len(diff))
xrange2 = range(len(traces[0]))
p.line(xrange, diff, line_color="red")
show(p)
118

Now that you have some peak data, you'll want to use this to find the time shift between XORs. This time shift should be constant from one byte to the next and needs to fit the peaks for every byte (each run through the loop is the same, so it makes sense that the time shift should be constant). For example, you might have:

0th byte @ 37, 41
1st byte @ 77, 81
2nd byte @ 105, 117, 121
3rd byte @ 141, 157, 161
4th byte @ 197, 201

With this data, peaks at 41, 81, 121, 161, and 201 have a constant time shift of 40. This means the location of the XORs is 41 + 40 * byte#.
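
Searching for the constant shift by eye gets tedious, so here is a small sketch that does it for the example peaks above. The function name find_constant_stride is made up for this illustration; it tries every pairing of a byte 0 peak with a byte 1 peak and keeps the pairs whose spacing also lands on a recorded peak for every later byte.

```python
def find_constant_stride(candidates):
    """candidates[k] lists the peak positions recorded while attacking byte k.

    Returns every (start, stride) pair for which start + stride * k is a
    recorded peak for all k.
    """
    chains = []
    for start in candidates[0]:
        for second in candidates[1]:
            stride = second - start
            if stride <= 0:
                continue
            if all(start + stride * k in peaks
                   for k, peaks in enumerate(candidates)):
                chains.append((start, stride))
    return chains

# Example peak data from the text above
peaks = [
    [37, 41],
    [77, 81],
    [105, 117, 121],
    [141, 157, 161],
    [197, 201],
]
print(find_constant_stride(peaks))   # prints [(37, 40), (41, 40)]
```

For this example data the stride of 40 is unambiguous, but two starting offsets survive (37 and 41). Both sit inside the same loop iteration, so if one of them gives an implausible IV it is worth rerunning the attack with the other; the example in this tutorial uses 41.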

The Other 127

The best way to attack the IV would be to repeat the 1-bit conceptual attack for each of the bits. Try to do this yourself! (Really!) If you're stuck, here are a few hints to get you going:

One easy way of looping through the bits is by using two nested loops, like this:

for byte in range(16):
    for bit in range(8):
        # Attack bit number (byte*8 + bit)

The sample that you'll want to look at will depend on which byte you're attacking. We had success when we used location = 51 + byte*60, but your mileage will vary.

The bitshift operator and the bitwise-AND operator are useful for getting at a single bit:

# This will either result in a 0 or a 1
checkIfBitSet = (byteToCheck >> bit) & 0x01

If you're really, really stuck, the end of this tutorial has a working script. After finding the IV, check supersecret.h and verify that your attack was successful.

In [33]:
btldr_IV = [0] * 16
for byte in range(16):
    location = 41 + byte * 40
    iv = 0
    for bit in range(8):
        pt_bits = [((dr[i][byte] >> (7-bit)) & 0x01) for i in range(numTraces)]

        # Split traces into 2 groups
        groupedPoints = [[] for _ in range(2)]
        for i in range(numTraces):
            groupedPoints[pt_bits[i]].append(trace_array[i][location])
            
        means = []
        for i in range(2):
            means.append(np.average(groupedPoints[i]))
        diff = means[1] - means[0]
        
        iv_bit = 1 if diff > 0 else 0
        iv = (iv << 1) | iv_bit
        
        print(iv_bit, end = " ")
        
    print("{:02X}".format(iv))
    btldr_IV[byte] = iv
    
print(btldr_IV)
1 1 0 0 0 0 0 1 C1
0 0 1 0 0 1 0 1 25
0 1 1 0 1 0 0 0 68
1 1 0 1 1 1 1 1 DF
1 1 1 0 0 1 1 1 E7
1 1 0 1 0 0 1 1 D3
0 0 0 1 1 0 0 1 19
1 1 0 1 1 0 1 0 DA
0 0 0 1 0 0 0 0 10
1 1 1 0 0 0 1 0 E2
0 1 0 0 0 0 0 1 41
0 1 1 1 0 0 0 1 71
0 0 1 1 0 0 1 1 33
1 0 1 1 0 0 0 0 B0
1 1 1 0 1 0 1 1 EB
0 0 1 1 1 1 0 0 3C
[193, 37, 104, 223, 231, 211, 25, 218, 16, 226, 65, 113, 51, 176, 235, 60]

Attacking the Signature

The last thing we can do with this bootloader is attack the signature. This final section will show how one byte of the signature could be recovered. If you want more of this kind of analysis, a more complete timing attack is shown in Tutorial B3-1 Timing Analysis with Power for Password Bypass.

Attack Theory

Recall from earlier that the signature check in C looks like:

if ((tmp32[0] == SIGNATURE1) &&
    (tmp32[1] == SIGNATURE2) &&
    (tmp32[2] == SIGNATURE3) &&
    (tmp32[3] == SIGNATURE4)){

In C, boolean expressions support short-circuiting. When checking multiple conditions, the program will stop evaluating these booleans as soon as it can tell what the final value will be. In this case, unless all four of the equality checks are true, the result will be false. Thus, as soon as the program finds a single false condition, it's done.

Open the listing file for your binary (.lss), find the signature check, and confirm that this is happening. For example, on the STM32F3, the assembly looks like this:

//Check the signature
                if ((tmp32[0] == SIGNATURE1) &&
 8000338:   f89d 3018   ldrb.w  r3, [sp, #24]
 800033c:   2b00        cmp r3, #0
 800033e:   d1c2        bne.n   80002c6 <main+0x52>
 8000340:   f89d 2019   ldrb.w  r2, [sp, #25]
 8000344:   2aeb        cmp r2, #235    ; 0xeb
 8000346:   d1be        bne.n   80002c6 <main+0x52>
                   (tmp32[1] == SIGNATURE2) &&
 8000348:   f89d 201a   ldrb.w  r2, [sp, #26]
 800034c:   2a02        cmp r2, #2
 800034e:   d1ba        bne.n   80002c6 <main+0x52>
                   (tmp32[2] == SIGNATURE3) &&
 8000350:   f89d 201b   ldrb.w  r2, [sp, #27]
 8000354:   2a1d        cmp r2, #29
 8000356:   d1b6        bne.n   80002c6 <main+0x52>
                   (tmp32[3] == SIGNATURE4)){

This assembly code confirms the short-circuiting operation. Each of the four assembly blocks includes a comparison and a conditional branch. All four of the conditional branches (bne.n) return the program to the same location (the start of the while(1) loop). None of the four branches may be taken if the program is to reach the body of the if block.

The short-circuiting conditions are perfect for us. We can use our power traces to watch how long it takes for the signature check to fail. If the check takes longer than usual, then we know that the first byte of our signature was right.
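
The idea can be tried out on a toy model before touching the hardware. In the sketch below, check_duration is a made-up function that counts how many compare-and-branch groups run, plus one extra unit when the if body is reached; on the real target this duration has to be read off the power trace instead. The signature constants are the ones visible in the listing above.

```python
SIGNATURE = (0x00, 0xEB, 0x02, 0x1D)   # constants visible in the listing above

def check_duration(block):
    """Simulated duration of the signature check, in arbitrary time units."""
    duration = 0
    for got, expected in zip(block[:4], SIGNATURE):
        duration += 1              # one load / compare / branch group
        if got != expected:
            return duration        # short-circuit: remaining bytes are skipped
    return duration + 1            # all four matched, so the if body also runs

def recover_signature():
    """Recover the signature one byte at a time from durations alone."""
    known = []
    for position in range(4):
        durations = []
        for guess in range(256):
            block = known + [guess] + [0] * (3 - position)
            durations.append(check_duration(block))
        # The correct guess is the one whose check runs longest.
        known.append(durations.index(max(durations)))
    return known

print(["{:02X}".format(b) for b in recover_signature()])
# prints ['00', 'EB', '02', '1D']
```

Because each byte is confirmed independently, the search costs at most 4 * 256 attempts instead of the 256^4 a brute-force search over the whole signature would need.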

Power Traces

Our capture loop will be pretty similar to the one we used to break the IV, but now that we know the secret values of the encryption process we can make some improvements by encrypting the text that we send. This has two important advantages:

  1. We can control the signature. We could reuse the traces we took during the IV attack, but this way ensures that we hit each possible value once. It also simplifies the analysis, since we don't have to worry about decrypting the text we sent.
  2. We no longer have to reset after each attempt, since we know what the next IV is going to be (we do need to reset at the beginning to make sure we're on the same starting IV as the target). This speeds up the capture process considerably.

To perform the AES256 CBC encryption, there's a few steps we need to take:

  1. XOR the IV with the text we want to send
  2. Encrypt this new text
  3. Set this cipher text as the new IV
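
The three steps above can be sketched as a short helper. To keep the example runnable without any crypto library, toy_encrypt and toy_decrypt are made-up placeholders for the AES-256 block operations (they are not AES), and cbc_encrypt_blocks, KEY and START_IV are hypothetical names and values. The capture loop in this tutorial uses PyCrypto's AES in ECB mode for step 2.

```python
def toy_encrypt(block, key):
    """Placeholder for AES-256 ECB encryption of one block. NOT real AES."""
    return bytes(((b ^ k) + 1) & 0xFF for b, k in zip(block, key))

def toy_decrypt(block, key):
    """Inverse of toy_encrypt, standing in for the bootloader's decryption."""
    return bytes(((b - 1) & 0xFF) ^ k for b, k in zip(block, key))

def cbc_encrypt_blocks(blocks, key, iv):
    """Chain several 16-byte blocks the way the sender has to."""
    iv = bytes(iv)
    out = []
    for block in blocks:
        mixed = bytes(p ^ v for p, v in zip(block, iv))   # 1. XOR the IV with the text
        ct = toy_encrypt(mixed, key)                      # 2. encrypt this new text
        out.append(ct)
        iv = ct                                           # 3. ciphertext is the new IV
    return out

KEY = bytes(range(16))              # hypothetical toy key
START_IV = bytes([0x3C] * 16)       # hypothetical starting IV
messages = [bytes([guess] + [0] * 15) for guess in range(3)]
ciphertexts = cbc_encrypt_blocks(messages, KEY, START_IV)

# Receiver side, starting from the same IV, gets the messages back.
iv = START_IV
decoded = []
for ct in ciphertexts:
    decoded.append(bytes(d ^ v for d, v in zip(toy_decrypt(ct, KEY), iv)))
    iv = ct
print(decoded == messages)
```

The sender and the target only stay in step if every block the sender encrypts is actually processed by the target, which is why the capture starts with a reset.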

We can use PyCrypto again to make the encryption process easy, and the other two steps are simple operations. We'll run our loop 256 times (once for each possible byte value) and assign that value to the byte we want to check. We're not quite sure where the check is happening, so we'll be safe and capture 24000 samples per trace. Everything else should look familiar from earlier parts of the tutorial:

In [34]:
from tqdm import tqdm
import numpy as np
from Crypto.Cipher import AES
import time

traces = []
keys = []
plaintexts = []

iv = [0xC1, 0x25, 0x68, 0xDF, 0xE7, 0xD3, 0x19, 0xDA, 0x10, 0xE2, 0x41, 0x71, 0x33, 0xB0, 0xEB, 0x3C]

knownkey = [0x94, 0x28, 0x5D, 0x4D, 0x6D, 0xCF, 0xEC, 0x08, 0xD8, 0xAC, 0xDD, 0xF6, 0xBE, 0x25, 0xA4, 0x99,
            0xC4, 0xD9, 0xD0, 0x1E, 0xC3, 0x40, 0x7E, 0xD7, 0xD5, 0x28, 0xD4, 0x09, 0xE9, 0xF0, 0x88, 0xA1]

knownkey = bytes(knownkey)
aes = AES.new(knownkey, AES.MODE_ECB)
N = 256 # Number of traces

reset_target(scope)
okay=0
scope.adc.basic_mode = "falling_edge"
while not okay:
    target.ser.write("\0xxxxxxxxxxxxxxxxxx")
    time.sleep(0.005)
    num_char = target.ser.inWaiting()
    response = target.ser.read(num_char)
    if response:
        if ord(response[0]) == 0xA1:
            okay = 1

scope.adc.samples = 24000
scope.adc.offset = 0
target.init()
for byte in tqdm(range(N), desc='Attacking Signature Byte'):
    message = [0x00]
    text = [0] * 16
    
    # the 4 signature bytes
    text[0] = byte
    text[1] = 0
    text[2] = 0
    text[3] = 0
    

    num_char = target.ser.inWaiting()
    target.ser.read(num_char)
    
    textcpy = [0] * 16
    textcpy[:] = text[:]
    plaintexts.append(textcpy)
    
    # Apply IV
    for i in range(len(iv)):
        text[i] ^= iv[i]
    
    # Encrypt text
    ct = aes.encrypt(bytes(text))
    
    message.extend(ct)
    
    # Use ct as new IV
    iv[:] = ct[:]
    
    crc = bl_crc.bit_by_bit(ct)
    message.append(crc >> 8)
    message.append(crc & 0xFF)

    # run aux stuff that should run before the scope arms here

    scope.arm()

    # run aux stuff that should run after the scope arms here

    target.ser.write(message)
    timeout = 50
    # wait for target to finish
    while target.isDone() is False and timeout:
        timeout -= 1
        time.sleep(0.01)

    try:
        ret = scope.capture()
        if ret:
            print('Timeout happened during acquisition')
            continue
    except IOError as e:
        print('IOError: %s' % str(e))

    # run aux stuff that should happen after trace here
    num_char = target.ser.inWaiting()
    response = target.ser.read(num_char)
    if ord(response[0]) != 0xA4:
        # Bad response, just skip
        print("Bad response: {:02X}".format(ord(response[0])))
        continue
    
    traces.append(scope.getLastTrace())
Attacking Signature Byte: 100%|██████████| 256/256 [00:46<00:00,  5.42it/s]

Analysis

Now that we've captured our traces, the actual analysis is pretty simple. We're looking for a single trace that looks very different from the rest. A simple way to find this is to compare all the traces to a reference trace. We'll use the average of all the traces as our reference:

In [35]:
mean = np.average(traces, axis=0)

That leaves us with comparing the traces. Let's start by plotting the difference between some of the traces and the mean:

In [36]:
from bokeh.plotting import figure, show
from bokeh.io import output_notebook

output_notebook()
p = figure()
colors = ["red", "blue", "green", "yellow"]
for i in range(0,10):
    p.line(range(len(traces[i])), traces[i]-mean, line_color=colors[i%4])
        
show(p)

Depending on your target, you might have seen something like this:

Looks like we've found our trace! However, let's clean this up with some statistics. We can use the correlation coefficient to see which traces are the furthest away from the average. We only want to take the correlation across the region where the plots differ, so choose a subset of the plot where there's a large difference. In the case of the above picture, the difference starts at around 18k and continues until the end. A range of 18000 to 20000 should work nicely:

In [37]:
corr = []
for i in range(256):
    corr.append(np.corrcoef(mean[18000:20000], traces[i][18000:20000])[0, 1])
print(np.sort(corr))
print(np.argsort(corr))
[0.38655112 0.99823866 0.99855735 0.99867205 0.99892545 0.99899505
 0.99899775 0.99907795 0.99909101 0.99909919 0.99913682 0.99916286
 0.99918754 0.99921001 0.99923019 0.9992457  0.99924864 0.99927149
 0.99929271 0.99930348 0.99931348 0.99933237 0.99933408 0.99933642
 0.99935235 0.99938373 0.99939167 0.99939264 0.99939516 0.99940017
 0.99940083 0.9994117  0.99941693 0.99942279 0.99942316 0.99943074
 0.99944895 0.99947055 0.99948625 0.99948832 0.99949295 0.99950266
 0.99950285 0.99951352 0.99951495 0.99952458 0.99955355 0.99955374
 0.99955892 0.99955905 0.99956135 0.99956154 0.99956249 0.99956431
 0.99956888 0.99956955 0.99957073 0.99957346 0.9995766  0.99958163
 0.99958281 0.99958821 0.99959275 0.99959348 0.99959828 0.99960004
 0.99960327 0.99960816 0.99961064 0.99962326 0.99962809 0.99962971
 0.99963159 0.99963706 0.99964055 0.99964257 0.99964496 0.99964652
 0.99964892 0.9996492  0.99964969 0.99965019 0.99965043 0.99965053
 0.99965782 0.9996582  0.99966005 0.9996609  0.99966152 0.99966408
 0.99966476 0.99966663 0.99966725 0.99966919 0.99967373 0.99967519
 0.99967768 0.99967888 0.99967909 0.99968268 0.99968273 0.99968399
 0.99968443 0.99968459 0.99968586 0.99968645 0.99968703 0.99969089
 0.999691   0.99969152 0.99969425 0.99969586 0.99969867 0.99969989
 0.99970456 0.99970773 0.99970801 0.99970823 0.99970996 0.9997137
 0.9997142  0.99971949 0.99972046 0.9997223  0.99972493 0.99972633
 0.99972796 0.99972844 0.99972867 0.99973319 0.99973493 0.99973661
 0.99973678 0.99973715 0.99973727 0.99973896 0.99974188 0.99974558
 0.99974606 0.99974726 0.99974898 0.99975092 0.99975104 0.99975138
 0.99975175 0.99975292 0.99975305 0.99975321 0.99975673 0.99975834
 0.99975874 0.99976015 0.99976077 0.99976327 0.99976391 0.99976567
 0.9997666  0.99976667 0.99976765 0.99976772 0.99976834 0.99977025
 0.99977406 0.99977578 0.9997766  0.99977689 0.99977819 0.99977945
 0.9997797  0.99977999 0.99978023 0.9997815  0.99978252 0.99978412
 0.99978434 0.99978457 0.99978477 0.9997868  0.99978859 0.99978892
 0.99978921 0.99979124 0.99979277 0.99979298 0.99979507 0.99979586
 0.99979613 0.99979648 0.99979788 0.99980116 0.99980183 0.99980413
 0.99980455 0.99980658 0.99980667 0.99980765 0.9998081  0.9998101
 0.99981187 0.99981312 0.99981547 0.99981601 0.99981811 0.99981824
 0.99981864 0.99981977 0.999821   0.99982186 0.99982187 0.99982428
 0.99982438 0.99982604 0.99982698 0.999827   0.999827   0.99982876
 0.99982936 0.99983082 0.99983322 0.99983335 0.99983348 0.99983378
 0.9998342  0.99983593 0.99983834 0.99983854 0.99984074 0.99984133
 0.99984163 0.99984229 0.9998425  0.99984269 0.99984327 0.99984338
 0.99984342 0.99984416 0.99984435 0.99984627 0.99984642 0.99984755
 0.99984871 0.99984873 0.99985013 0.9998525  0.99985491 0.9998551
 0.9998574  0.99985854 0.99986028 0.99986147 0.99986189 0.99986272
 0.9998629  0.99987368 0.99987581 0.99989051]
[  0 203 106   4  26 204 108  82  73   7   9 195 179   5 250 202  18  41
 110 196  39 155   2   1  37 153 162  88 107  30 129  16  19 229 206 207
  38 154 251 145  98 146  31 209 237  87 230  79 231 187 255 174  13 105
 149  34  29 200  21 192 212  59 253 243  67  24 140 102   3 175 142 193
 205 216 172  57  97  15 118 213   8 119 211  90  20  44 131  32  78 164
 125 182 180 158 208  94 238 194 227  48 141  63 123 128 236  91  76 225
  62 197  23  56  81 228 240 157  28 235 147 245  17 178  58 156 189 150
 210  46 234 135  47  93 199   6 113 165  51 132 121 137  85 177 233 191
 168 115 252 159  80 190  96 183 201 136  43  40 226 148 151 122 188 134
  60  54  74  27 242 126 143  22 221 198 218 239  65 144  84 254 171 170
 101  36 246 184 160 139  12 249 166 247 114 103  99  72  49  70 130 124
  83  66 100  33 224 217 111  25 109  64 219  45 220 161  89 169 127 214
 241  55 222  68  42 116 244 185  14  10 163  69  95 215  77 181 223 186
 173  53 104  75 152 138 248  35 176 232 112 120  61  11 167  71  50 133
 117  86  92  52]

This output tells us two things:

  • The first list says that almost every trace looks very similar to the overall mean (99.8% correlated or higher). However, there's one trace that is totally different, with about 39% correlation. This is probably our correct guess.
  • The second list gives the signature guess that matches each of the above correlations. The first number in the list is 0 (0x00), which is the correct first byte of the signature!
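
The same outlier search can be reproduced on synthetic data, which is handy for checking the analysis code before trusting it on real captures. Everything in this sketch is made up for illustration: the traces are noisy sine waves, the trace at index ODD_ONE is delayed by a few samples to mimic a check that takes longer, and pearson and find_outlier are hypothetical helper names written in plain Python rather than NumPy.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def find_outlier(traces, start, stop):
    """Index of the trace least correlated with the mean trace in [start, stop)."""
    n = len(traces)
    mean = [sum(t[i] for t in traces) / n for i in range(start, stop)]
    corr = [pearson(mean, t[start:stop]) for t in traces]
    return min(range(n), key=corr.__getitem__), corr

rng = random.Random(0)
LENGTH, DELAY, ODD_ONE = 300, 5, 0x2A      # hypothetical simulation parameters

def make_trace(delay):
    return [math.sin((t - delay) / 3.0) + rng.gauss(0, 0.05) for t in range(LENGTH)]

traces = [make_trace(DELAY if i == ODD_ONE else 0) for i in range(256)]
index, corr = find_outlier(traces, 100, 300)
print("least correlated trace: 0x{:02X}".format(index))
```

Using the minimum correlation rather than a fixed threshold keeps the search independent of how noisy a particular setup is, at the cost of always reporting some guess even when no trace really stands out.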

To finish this attack, change the capture loop to keep the first byte fixed and vary the second byte instead. Repeat this with the rest of the bytes and you should have the signature.

In [38]:
from tqdm import tqdm
import numpy as np
from Crypto.Cipher import AES
import time

traces = []
keys = []
plaintexts = []
btldr_sig = [0] * 4

iv = [0xC1, 0x25, 0x68, 0xDF, 0xE7, 0xD3, 0x19, 0xDA, 0x10, 0xE2, 0x41, 0x71, 0x33, 0xB0, 0xEB, 0x3C]

knownkey = [0x94, 0x28, 0x5D, 0x4D, 0x6D, 0xCF, 0xEC, 0x08, 0xD8, 0xAC, 0xDD, 0xF6, 0xBE, 0x25, 0xA4, 0x99,
            0xC4, 0xD9, 0xD0, 0x1E, 0xC3, 0x40, 0x7E, 0xD7, 0xD5, 0x28, 0xD4, 0x09, 0xE9, 0xF0, 0x88, 0xA1]

knownkey = bytes(knownkey)
aes = AES.new(knownkey, AES.MODE_ECB)
N = 256 # Number of traces

reset_target(scope)
okay=0
scope.adc.basic_mode = "falling_edge"
while not okay:
    target.ser.write("\0xxxxxxxxxxxxxxxxxx")
    time.sleep(0.005)
    num_char = target.ser.inWaiting()
    response = target.ser.read(num_char)
    if response:
        if ord(response[0]) == 0xA1:
            okay = 1
            
scope.adc.samples = 24000
scope.adc.offset = 0
target.init()
for bnum in range(4):
    traces = []
    for byte in tqdm(range(N), desc='Attacking Signature Byte {}'.format(bnum)):
        message = [0x00]
        text = [0] * 16

        # the 4 signature bytes
        for j in range(bnum):
            text[j] = btldr_sig[j]
        text[bnum] = byte


        num_char = target.ser.inWaiting()
        target.ser.read(num_char)

        textcpy = [0] * 16
        textcpy[:] = text[:]
        plaintexts.append(textcpy)

        # Apply IV
        for i in range(len(iv)):
            text[i] ^= iv[i]

        # Encrypt text
        ct = aes.encrypt(bytes(text))

        message.extend(ct)

        # Use ct as new IV
        iv[:] = ct[:]

        crc = bl_crc.bit_by_bit(ct)
        message.append(crc >> 8)
        message.append(crc & 0xFF)

        # run aux stuff that should run before the scope arms here

        scope.arm()

        # run aux stuff that should run after the scope arms here

        target.ser.write(message)
        timeout = 50
        # wait for target to finish
        while target.isDone() is False and timeout:
            timeout -= 1
            time.sleep(0.01)

        try:
            ret = scope.capture()
            if ret:
                print('Timeout happened during acquisition')
                continue
        except IOError as e:
            print('IOError: %s' % str(e))

        # run aux stuff that should happen after trace here
        num_char = target.ser.inWaiting()
        response = target.ser.read(num_char)
        if ord(response[0]) != 0xA4:
            # Bad response, just skip
            print("Bad response: {:02X}".format(ord(response[0])))
            continue

        traces.append(scope.getLastTrace())
        
    mean = np.average(traces, axis=0)
    corr = []
    for i in range(256):
        corr.append(np.corrcoef(mean[18000:20000], traces[i][18000:20000])[0, 1])
    btldr_sig[bnum] = np.argsort(corr)[0]
Attacking Signature Byte 0: 100%|██████████| 256/256 [00:51<00:00,  3.99it/s]
Attacking Signature Byte 1: 100%|██████████| 256/256 [00:47<00:00,  6.49it/s]
Attacking Signature Byte 2: 100%|██████████| 256/256 [00:55<00:00,  3.38it/s]
Attacking Signature Byte 3: 100%|██████████| 256/256 [00:49<00:00,  6.60it/s]
In [39]:
scope.dis()
target.dis()

Conclusion

We've now successfully recovered all of the secrets of the bootloader!

Tests

In [40]:
real_btldr_key = [0x94, 0x28, 0x5D, 0x4D, 0x6D, 0xCF, 0xEC, 0x08, 0xD8, 0xAC, 0xDD, 0xF6, 0xBE, 0x25, 0xA4, 0x99, \
                    0xC4, 0xD9, 0xD0, 0x1E, 0xC3, 0x40, 0x7E, 0xD7, 0xD5, 0x28, 0xD4, 0x09, 0xE9, 0xF0, 0x88, 0xA1]

real_btldr_IV = [0xC1, 0x25, 0x68, 0xDF, 0xE7, 0xD3, 0x19, 0xDA, 0x10, 0xE2, 0x41, 0x71, 0x33, 0xB0, 0xEB, 0x3C]

real_btldr_sig = [0x00, 0xEB, 0x02, 0x1D]
In [41]:
assert (btldr_key == real_btldr_key), "Attack on encryption key failed!\nGot: {}\nExpected: {}".format(btldr_key, real_btldr_key)
In [42]:
assert (btldr_IV == real_btldr_IV), "Attack on IV failed!\nGot: {}\nExpected: {}".format(btldr_IV, real_btldr_IV)
In [43]:
assert (btldr_sig == real_btldr_sig), "Attack on signature failed!\nGot: {}\nExpected: {}".format(btldr_sig, real_btldr_sig)